Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-26680][SQL] Eagerly create inputVars while conditions are appropriate #23617

Closed
wants to merge 2 commits into from

Conversation

bersprockets
Copy link
Contributor

What changes were proposed in this pull request?

When a user passes a Stream to groupBy, CodegenSupport.consume ends up lazily generating inputVars from a Stream, since the field output will be a Stream. At the time output.zipWithIndex.map is called, conditions are correct. However, by the time the map operation actually executes, conditions are no longer appropriate. The closure used by the map operation ends up using a reference to the partially created inputVars. As a result, a StackOverflowError occurs.

This PR ensures that inputVars is eagerly created while conditions are appropriate. It seems this was also an issue with the code path for creating inputVars from outputVars (SPARK-25767). I simply extended the solution for that code path to encompass both code paths.

How was this patch tested?

SQL unit tests
new test
python tests

Fix scope of match

Add test
@@ -159,6 +156,10 @@ trait CodegenSupport extends SparkPlan {
BoundReference(i, attr.dataType, attr.nullable).genCode(ctx)
}
}
} match {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this construction is confusing. Can this be split into two statements rather than chain a block and match?

@SparkQA
Copy link

SparkQA commented Jan 22, 2019

Test build #101544 has finished for PR 23617 at commit 367b6e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito
Copy link
Contributor

squito commented Jan 22, 2019

aside from sean's comments, lgtm. Also cc @hvanhovell since you reviewed #22789 for SPARK-25767

@SparkQA
Copy link

SparkQA commented Jan 22, 2019

Test build #101553 has finished for PR 23617 at commit 5f27005.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@hvanhovell hvanhovell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@asfgit asfgit closed this in d4a30fa Jan 24, 2019
asfgit pushed a commit that referenced this pull request Jan 24, 2019
…opriate

## What changes were proposed in this pull request?

When a user passes a Stream to groupBy, ```CodegenSupport.consume``` ends up lazily generating ```inputVars``` from a Stream, since the field ```output``` will be a Stream. At the time ```output.zipWithIndex.map``` is called, conditions are correct. However, by the time the map operation actually executes, conditions are no longer appropriate. The closure used by the map operation ends up using a reference to the partially created ```inputVars```. As a result, a StackOverflowError occurs.

This PR ensures that ```inputVars``` is eagerly created while conditions are appropriate. It seems this was also an issue with the code path for creating ```inputVars``` from ```outputVars``` (SPARK-25767). I simply extended the solution for that code path to encompass both code paths.

## How was this patch tested?

SQL unit tests
new test
python tests

Closes #23617 from bersprockets/SPARK-26680_opt1.

Authored-by: Bruce Robbins <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
(cherry picked from commit d4a30fa)
Signed-off-by: Herman van Hovell <[email protected]>
@hvanhovell
Copy link
Contributor

Merged to master/2.4. If we need this in 2.3 as well, then we need to make a backport that fixes the conflicts.

bersprockets added a commit to bersprockets/spark that referenced this pull request Jan 24, 2019
…opriate

When a user passes a Stream to groupBy, ```CodegenSupport.consume``` ends up lazily generating ```inputVars``` from a Stream, since the field ```output``` will be a Stream. At the time ```output.zipWithIndex.map``` is called, conditions are correct. However, by the time the map operation actually executes, conditions are no longer appropriate. The closure used by the map operation ends up using a reference to the partially created ```inputVars```. As a result, a StackOverflowError occurs.

This PR ensures that ```inputVars``` is eagerly created while conditions are appropriate. It seems this was also an issue with the code path for creating ```inputVars``` from ```outputVars``` (SPARK-25767). I simply extended the solution for that code path to encompass both code paths.

SQL unit tests
new test
python tests

Closes apache#23617 from bersprockets/SPARK-26680_opt1.

Authored-by: Bruce Robbins <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
@bersprockets
Copy link
Contributor Author

@hvanhovell @srowen @squito Thanks! I will make a PR for branch-2.3 later today.

dongjoon-hyun pushed a commit that referenced this pull request Jan 25, 2019
…s while conditions are appropriate

## What changes were proposed in this pull request?

Back port of #22789 and #23617 to branch-2.3

When a user passes a Stream to groupBy, ```CodegenSupport.consume``` ends up lazily generating ```inputVars``` from a Stream, since the field ```output``` will be a Stream. At the time ```output.zipWithIndex.map``` is called, conditions are correct. However, by the time the map operation actually executes, conditions are no longer appropriate. The closure used by the map operation ends up using a reference to the partially created ```inputVars```. As a result, a StackOverflowError occurs.

This PR ensures that ```inputVars``` is eagerly created while conditions are appropriate. It seems this was also an issue with the code path for creating ```inputVars``` from ```outputVars``` (SPARK-25767). I simply extended the solution for that code path to encompass both code paths.

## How was this patch tested?

SQL unit tests
new test
python tests

Closes #23642 from bersprockets/SPARK-26680_branch23.

Authored-by: Bruce Robbins <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@bersprockets bersprockets deleted the SPARK-26680_opt1 branch January 31, 2019 18:40
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…opriate

## What changes were proposed in this pull request?

When a user passes a Stream to groupBy, ```CodegenSupport.consume``` ends up lazily generating ```inputVars``` from a Stream, since the field ```output``` will be a Stream. At the time ```output.zipWithIndex.map``` is called, conditions are correct. However, by the time the map operation actually executes, conditions are no longer appropriate. The closure used by the map operation ends up using a reference to the partially created ```inputVars```. As a result, a StackOverflowError occurs.

This PR ensures that ```inputVars``` is eagerly created while conditions are appropriate. It seems this was also an issue with the code path for creating ```inputVars``` from ```outputVars``` (SPARK-25767). I simply extended the solution for that code path to encompass both code paths.

## How was this patch tested?

SQL unit tests
new test
python tests

Closes apache#23617 from bersprockets/SPARK-26680_opt1.

Authored-by: Bruce Robbins <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
…opriate

## What changes were proposed in this pull request?

When a user passes a Stream to groupBy, ```CodegenSupport.consume``` ends up lazily generating ```inputVars``` from a Stream, since the field ```output``` will be a Stream. At the time ```output.zipWithIndex.map``` is called, conditions are correct. However, by the time the map operation actually executes, conditions are no longer appropriate. The closure used by the map operation ends up using a reference to the partially created ```inputVars```. As a result, a StackOverflowError occurs.

This PR ensures that ```inputVars``` is eagerly created while conditions are appropriate. It seems this was also an issue with the code path for creating ```inputVars``` from ```outputVars``` (SPARK-25767). I simply extended the solution for that code path to encompass both code paths.

## How was this patch tested?

SQL unit tests
new test
python tests

Closes apache#23617 from bersprockets/SPARK-26680_opt1.

Authored-by: Bruce Robbins <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
(cherry picked from commit d4a30fa)
Signed-off-by: Herman van Hovell <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019
…opriate

## What changes were proposed in this pull request?

When a user passes a Stream to groupBy, ```CodegenSupport.consume``` ends up lazily generating ```inputVars``` from a Stream, since the field ```output``` will be a Stream. At the time ```output.zipWithIndex.map``` is called, conditions are correct. However, by the time the map operation actually executes, conditions are no longer appropriate. The closure used by the map operation ends up using a reference to the partially created ```inputVars```. As a result, a StackOverflowError occurs.

This PR ensures that ```inputVars``` is eagerly created while conditions are appropriate. It seems this was also an issue with the code path for creating ```inputVars``` from ```outputVars``` (SPARK-25767). I simply extended the solution for that code path to encompass both code paths.

## How was this patch tested?

SQL unit tests
new test
python tests

Closes apache#23617 from bersprockets/SPARK-26680_opt1.

Authored-by: Bruce Robbins <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
(cherry picked from commit d4a30fa)
Signed-off-by: Herman van Hovell <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants